ABSTRACT 
Identifying foreground objects in an image is one of the most common 
operations in image processing. In this work, the Mask R-CNN algorithm is 
used to identify solar photovoltaic (PV) panels in aerial images and create a 
mask that can be used to remove the background from the images. This allows 
the PV panels to be processed separately. Using machine learning (ML) to solve 
this problem can generate more accurate results than more traditional image 
processing techniques, such as edge detection or Gaussian filtering, when identifying 
the PV panels and selecting the pixels that belong to them while ignoring the 
background pixels. This kind of work can be useful for collecting information 
about PV installations present in aerial or satellite imagery, or for analyzing the 
health and integrity of PV modules in large-scale installations, e.g., in a solar 
power plant. The results show that this method is effective, with a high 
potential for improved results if the model is trained on larger and more 
diverse datasets. 

"... Background Object in ... Images", published in International Journal of Trend in Scientific Research and ...
INTRODUCTION 

There is a growing interest in high-quality information about 
small-scale solar power (photovoltaic, PV) installations 
among governments, agencies, and decision makers, who need it 
to better estimate the growth in power demand 
and trends in renewable energy use. Currently, statistics on 
the use of solar energy are based on data from the 
import and sale of PV panels. This methodology can 
only give rough estimates and cannot keep track of rapid 
local changes. In larger-scale installations, on the other hand, 
detecting PV panels in drone imagery helps in processing the 
images to find faults or damage in the system. 


The problem of identifying objects of interest in an image 
and isolating them from the background can be solved using 
numerous methods. The most obvious route would be to use 
an edge detection algorithm combined with some 
pre-processing for noise reduction, followed by 
steps that separate the foreground from the rest of the 
image. Processing certain types of images with filters and 
other mathematical edge detection and pixel manipulation 
techniques can be difficult, depending on the nature of the scene 
and on the imaging conditions and equipment used. 
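As a point of reference for the classical route described above, the following is a minimal NumPy sketch of Gaussian noise reduction followed by Sobel gradient thresholding. It is not the authors' code: the kernel size, sigma, relative threshold and function names are illustrative assumptions, and practical pipelines would normally use a library such as OpenCV or scikit-image instead of hand-written filters.

```python
import numpy as np

def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Normalized 2-D Gaussian kernel of shape (size, size); size should be odd."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def filter2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-size 2-D cross-correlation with edge padding (odd-sized kernels).

    For the symmetric Gaussian this equals convolution; for Sobel it only
    flips the sign of the gradient, which the magnitude below ignores.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def edge_map(gray: np.ndarray, sigma: float = 1.0, rel_threshold: float = 0.2) -> np.ndarray:
    """Boolean edge map: Gaussian smoothing, Sobel gradient magnitude, threshold."""
    smoothed = filter2d(gray, gaussian_kernel(5, sigma))
    magnitude = np.hypot(filter2d(smoothed, SOBEL_X), filter2d(smoothed, SOBEL_Y))
    peak = magnitude.max()
    if peak > 0:
        magnitude = magnitude / peak  # threshold relative to the strongest edge
    return magnitude > rel_threshold

# Synthetic example: a bright rectangular "panel" on a dark background.
gray = np.zeros((40, 40))
gray[10:30, 12:28] = 1.0
edges = edge_map(gray)
```

The edge map still has to be grouped into closed regions before it resembles a panel mask, and a fixed relative threshold usually needs retuning when brightness or contrast changes. This is the kind of fragility that motivates the learned approach below.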


The use of deep learning in image processing has 
been growing, and it has been applied to problems 
such as road detection [1-3], scene labeling [4], vehicle 
detection [5], detection of people [6], and detection of 
buildings [7]. Convolutional Neural Networks (CNNs) were 
used in previous works for the classification and 
detection of solar panels, where CNNs produced the best 
results [8]. 


ML algorithms can provide crisp edges and locate the object 
of interest in various views, often producing much better 
outcomes than other image processing methods. 
Most notably, they surpass the traditional techniques when 
the objects of interest are surrounded by other background 
objects in different types of environments, e.g., if the detected 
objects are small compared to other objects in the view, or if 
the studied images vary widely in brightness, contrast, 
imaging resolution, etc. 


In this work, one of these ML algorithms, Mask R-CNN [9], is 
investigated to determine its suitability and effectiveness in 
identifying photovoltaic modules in aerial photographs 
taken by a drone flying over a power plant installation. The 
dataset of images used in this study was collected and made 
available online by SenseFly, using their eBee 
Classic drone [10]. A number of other available datasets are 
reviewed in a report by Curier et al. [11], which can be useful 
for further studies in this area. 


Table 1, Technical data about the studied dataset 


Ground resolution | 14.23 cm/px (5.6 in/px)
Coverage | 0.08 square km (0.03 sq. mi)
Flight height | 70 m (229.6 ft)
Number of images | 1075
Image format | TIFF
Applying Mask R-CNN 

During our work on processing a dataset of PV panel images 
for fault detection, the PV panel areas needed to be treated 
in isolation from the rest of the image pixels. This 
gave better results and less noise in the output, since all 
interactions with background objects were eliminated. The 
Mask R-CNN algorithm was selected for this application, and its 
results were evaluated. 





Figure 1 One of the processed images, its generated mask, and 
the result of multiplying the image by the mask 
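The multiplication shown in Figure 1 is an element-wise product of the image with a 0/1 mask: PV pixels keep their values and background pixels become 0. A minimal NumPy sketch, assuming the mask is a 0/1 array with the same height and width as the image (the function name is illustrative, not taken from the authors' software):

```python
import numpy as np

def remove_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Multiply an image by a binary mask (1 = PV pixel, 0 = background).

    Works for single-channel (H x W) and multi-channel (H x W x C) images.
    """
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must have the same height and width as the image")
    m = mask.astype(image.dtype)
    if image.ndim == 3:
        m = m[..., np.newaxis]  # broadcast the mask over all channels
    return image * m

# Example: a 4x4 RGB image with a 2x2 "panel" in the middle.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
isolated = remove_background(image, mask)
```

Keeping the product in the image's own dtype means the output can be saved in the same format as the input.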


Mask R-CNN is a deep learning algorithm designed to detect 
objects in an image and create a segmentation mask for each 
identified object. The algorithm uses Convolutional Neural 
Networks (CNNs) as a backbone. Such networks are widely 
used for image classification and recognition tasks such as 
face recognition and medical diagnosis. 
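To make the mask step more concrete, Mask R-CNN predicts a small fixed-size grid of foreground probabilities for each detected object inside its bounding box (28x28 in the original paper). At inference time this grid is resized to the box and binarized at 0.5. The NumPy sketch below shows only this final "paste" step. It uses nearest-neighbour resizing for brevity, where real implementations interpolate, and the function name and box convention are assumptions.

```python
import numpy as np

def paste_mask(mask_probs: np.ndarray, box: tuple, image_shape: tuple,
               threshold: float = 0.5) -> np.ndarray:
    """Resize a per-object mask to its box and binarize it in a full-size canvas.

    mask_probs: m x n array of foreground probabilities for one object.
    box: (x0, y0, x1, y1) integer pixel coordinates, x1/y1 exclusive,
         assumed to lie inside the image.
    """
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    m, n = mask_probs.shape
    # Nearest-neighbour index maps from box pixels back to mask cells.
    rows = (np.arange(h) * m) // h
    cols = (np.arange(w) * n) // w
    resized = mask_probs[rows[:, None], cols[None, :]]
    full = np.zeros(image_shape, dtype=bool)
    full[y0:y1, x0:x1] = resized > threshold
    return full

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
full_mask = paste_mask(probs, (1, 1, 5, 5), (6, 6))
```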


Some of the computer vision tasks that can be solved using CNNs: 

> Classification: does the object of interest appear in the 
image? 

> Object detection: how many objects are there and what 
are their positions? 

> Semantic segmentation: which pixels belong to objects 
of interest? 

> Instance segmentation: which pixels belong to each 
individual object instance? 
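The difference between the last two tasks can be shown without a network. A semantic mask only records which pixels are PV. Splitting it into instances means deciding which PV pixels belong together. The sketch below does this with classical 4-connected component labeling (breadth-first flood fill). This is not how Mask R-CNN produces instances, since it predicts one mask per detected box. In this sketch, panels that touch would also merge into a single component. The function name is illustrative.

```python
from collections import deque
from typing import Tuple

import numpy as np

def label_instances(semantic_mask: np.ndarray) -> Tuple[np.ndarray, int]:
    """Split a binary semantic mask into 4-connected instances.

    Returns (labels, count): labels[y, x] is 0 for background and 1..count
    for the connected region (instance) that the pixel belongs to.
    """
    mask = np.asarray(semantic_mask) > 0
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or labels[y, x]:
                continue  # background, or already part of an instance
            count += 1
            labels[y, x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of this region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

# Semantic mask with two separate rows of panels.
semantic = np.zeros((5, 8), dtype=np.uint8)
semantic[1:3, 0:3] = 1
semantic[1:3, 5:8] = 1
labels, n = label_instances(semantic)
```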
Figure 2 Comparison between semantic and instance 
segmentation. Original image (A), semantic 
segmentation result (B), instance segmentation result (C) 


The problem of removing the background can be solved 
using semantic segmentation or instance segmentation 
methods. Using deep learning for this step gave more reliable 
results than traditional image processing techniques. 
It also made the software more useful when 
processing image datasets captured under different 
conditions or with different types of imaging 
equipment. Such variations change the image properties, 
which traditional techniques are sensitive to, whereas CNNs 
tend to give better results and are better equipped to handle them. 


Various algorithms were considered during the 
development of our application. Among those 
investigated were Fast R-CNN [12], deep image matting [13], [14], 
and other background removal approaches [15], [16]. Canny 
edge detection and Sobel filter-based methods were also 
considered. 


Mask R-CNN [9] was preferred over the other algorithms for the following reasons: 

> Multiple open-source implementations are available and 
ready to use; 

> It is easy to use, as the algorithm is well explained and 
documented; 

> Training time is short; 

> Its results were superior to those of the other algorithms considered. 


A subset of the original dataset was selected to train the 
segmentation model. This subset is split into two groups: a 
training set used to train the model, and a validation set used 
to tune it. 


Both the training and validation sets consist of: 
1. The real images themselves (in their original condition). 
2. The PV module (PVM) masks marking the areas covered by PVMs. 
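The paper does not state how the subset was split. The sketch below shows one minimal, reproducible way to do it. The 80/20 ratio, the seed, and the idea of identifying each image/mask pair by a shared file stem are assumptions for illustration. Splitting by stem ensures that an image and its mask always end up in the same set.

```python
import random
from typing import List, Sequence, Tuple

def split_dataset(stems: Sequence[str], val_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle image/mask file stems reproducibly and split them into train/val.

    Each stem is assumed (hypothetically) to name one image and its mask,
    e.g. 'frame_001.tif' and 'frame_001.png'.
    """
    names = sorted(stems)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    n_val = round(len(names) * val_fraction)
    return names[n_val:], names[:n_val]

stems = [f"frame_{i:03d}" for i in range(10)]
train, val = split_dataset(stems)
```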


Masks used in the training and validation of the model were 
created manually by a human annotator and stored as PNG 
images where pixels have only two values: 0 for background 
pixels, and 1 for PV pixels. The masks were later converted to 
JSON annotations that can be loaded to train the 
Mask R-CNN model. 
Figure 3 Sample images from the studied dataset along with their masks 


Training the model 

To train and evaluate the ML model, Detectron2 [17] was 
chosen as the implementation of Mask R-CNN in this study. It 
is a modern open-source software library that builds on 
earlier deep learning models and provides ready-to-train 
implementations of a large number of modern ML algorithms. 


To prepare the dataset for training and evaluation, the masks 
were converted into JSON annotations in the COCO format, which 
Detectron2 can read directly. This annotation format was 
developed as part of the Microsoft COCO dataset [18]. 
It is supported by many deep learning libraries, 
making it the current de facto standard for image 
segmentation datasets. The JSON annotations were generated 
from the mask files stored as PNG images using a tool called 
pycococreator [19]. 
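Converting a PNG mask into a COCO annotation mainly requires a bounding box, an area, and an encoding of the mask. The sketch below builds these by hand for the mask of one object, using COCO's uncompressed run-length encoding (RLE). It is not pycococreator's code, and the function names are hypothetical. Any pixel value above 0 counts as foreground, so 0/1 and 0/255 PNG masks both work. pycococreator may instead emit polygons for non-crowd objects. COCO conventionally pairs RLE with `iscrowd=1`, so check how the loader handles RLE for non-crowd objects. For example, Detectron2 needs `INPUT.MASK_FORMAT` set to `"bitmask"` to train on RLE masks.

```python
import json
import numpy as np

def mask_to_rle(mask: np.ndarray) -> dict:
    """Uncompressed COCO-style RLE of a binary mask.

    Pixels are read in column-major (Fortran) order; counts alternate between
    runs of 0s and 1s, starting with 0s (possibly a run of length 0).
    """
    flat = (np.asarray(mask) > 0).astype(np.uint8).flatten(order="F")
    counts, current, run = [], 0, 0
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = value, 1
    counts.append(run)
    h, w = mask.shape
    return {"counts": counts, "size": [int(h), int(w)]}

def mask_to_annotation(mask: np.ndarray, image_id: int, ann_id: int,
                       category_id: int = 1) -> dict:
    """Build one COCO-style annotation dict for a single object's binary mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no foreground pixels")
    x0, y0 = int(xs.min()), int(ys.min())
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,  # e.g. the single "pvm" class
        "segmentation": mask_to_rle(mask),
        "area": int(np.count_nonzero(mask)),
        "bbox": [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1],  # [x, y, width, height]
        "iscrowd": 0,
    }

mask = np.zeros((4, 5), dtype=np.uint8)
mask[1:3, 2:4] = 1
ann = mask_to_annotation(mask, image_id=7, ann_id=1)
print(json.dumps(ann))
```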





Figure 4 Sample of the training input data 


After preparing the training images and JSON annotations, 
training was executed on a GPU to minimize training 
time. Google Colab [20] was used for this step. Colab is a free 
service that runs machine learning code written in 
Python on servers equipped with NVIDIA Tesla K80 or P100 
GPUs. 


To achieve higher accuracy with a small 
training set, transfer learning [21] was used. This means that 
instead of starting the training process from scratch, 
training starts from a model pre-trained on a different 
dataset. This is useful because the pre-trained model has already 
learned general visual features, even though it cannot yet 
recognize our custom object class. 


The starting model selected was the R50-FPN Mask R-CNN 
model pre-trained on the COCO dataset. A new "pvm" 
class was added, and the model was fine-tuned so that it can 
recognize these objects and create segmentation masks for 
them. 
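Starting from the COCO-pretrained model means copying every pre-trained tensor whose shape still fits the new model. The class-dependent output layers have to be trained from scratch, because COCO has 80 classes while the new model has only "pvm". Detectron2 skips such shape-mismatched parameters itself when loading a checkpoint. The NumPy sketch below only illustrates the idea. The parameter names imitate Detectron2's Mask R-CNN naming, but the shapes are shrunk toy values and the helper function is hypothetical.

```python
from typing import Dict, List

import numpy as np

def load_compatible_weights(model: Dict[str, np.ndarray],
                            pretrained: Dict[str, np.ndarray]) -> List[str]:
    """Copy pre-trained arrays into `model` wherever the name and shape match.

    Parameters whose shape depends on the number of classes cannot be copied
    and keep their fresh initialization; their names are returned.
    """
    skipped = []
    for name in list(model):
        source = pretrained.get(name)
        if source is not None and source.shape == model[name].shape:
            model[name] = source.copy()
        else:
            skipped.append(name)
    return skipped

# Toy parameter sets (names mimic Detectron2's Mask R-CNN; shapes are shrunk).
pretrained = {
    "backbone.bottom_up.stem.conv1.weight": np.ones((4, 3)),
    "roi_heads.box_predictor.cls_score.weight": np.ones((81, 8)),  # 80 COCO classes + background
}
model = {
    "backbone.bottom_up.stem.conv1.weight": np.zeros((4, 3)),
    "roi_heads.box_predictor.cls_score.weight": np.zeros((2, 8)),  # "pvm" + background
}
skipped = load_compatible_weights(model, pretrained)
```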


Experimental results 

The performance of the machine learning algorithm was 
evaluated on a test dataset during the training phase. The 
resulting Average Precision (AP) value is given in Table 2. This 
metric shows how well the algorithm finds masks 
for PVM class objects. Further performance metrics are 
provided in Table 3, which indicate the accuracy, sensitivity 
and recall of the trained model. 
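Table 3's values are not reproduced here. The sketch below shows how pixel-wise metrics of this kind are commonly computed from a predicted and a ground-truth binary mask. Sensitivity and recall are two names for the same ratio, TP / (TP + FN). Specificity and IoU are included as commonly reported companions; they are not metrics confirmed by the paper. The AP in Table 2 is computed differently, with COCO-style evaluation of matched detections averaged over several IoU thresholds, and is not covered by this code.

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Pixel-wise confusion-matrix metrics for binary masks (True/1 = PV pixel)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # PV pixels correctly detected
    tn = int(np.sum(~pred & ~truth))  # background correctly ignored
    fp = int(np.sum(pred & ~truth))   # background marked as PV
    fn = int(np.sum(~pred & truth))   # PV pixels missed

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # identical to sensitivity
        "specificity": ratio(tn, tn + fp),
        "iou": ratio(tp, tp + fp + fn),
    }

truth = np.array([[1, 1, 1, 0], [0, 0, 0, 0]])
pred = np.array([[1, 1, 0, 1], [0, 0, 0, 0]])
scores = pixel_metrics(pred, truth)
```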


The images used for both training and testing the models 
were selected from the original dataset so that blurry images 
and identical or almost identical frames were 
eliminated. Images that did not contain PV panels at all were 
also excluded from the training and validation of the model. 
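The paper does not say how blurry or near-duplicate frames were identified; this may have been done by hand. The sketch below shows one common automated heuristic:

- variance of the Laplacian as a sharpness score;
- a mean-absolute-difference test against the last kept frame to catch near-duplicates;
- an empty-mask test for frames without PV panels.

The thresholds and function names are hypothetical and would need tuning on the real TIFF frames.

```python
from typing import List, Sequence

import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry frame."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def select_frames(frames: Sequence[np.ndarray], masks: Sequence[np.ndarray],
                  blur_threshold: float = 10.0, dup_threshold: float = 1.0) -> List[int]:
    """Return indices of frames to keep, in order.

    A frame is dropped if its mask has no PV pixels, if it looks blurry, or if
    its mean absolute difference from the last kept frame is below dup_threshold.
    """
    kept: List[int] = []
    last = None
    for i, (frame, mask) in enumerate(zip(frames, masks)):
        if not np.any(mask):
            continue  # no PV panels in this frame
        if laplacian_variance(frame) < blur_threshold:
            continue  # too blurry
        g = np.asarray(frame, dtype=float)
        if last is not None and np.mean(np.abs(g - last)) < dup_threshold:
            continue  # (almost) identical to the last kept frame
        kept.append(i)
        last = g
    return kept

sharp = (np.indices((16, 16)).sum(axis=0) % 2) * 255.0  # checkerboard: lots of detail
flat = np.full((16, 16), 128.0)                          # no detail: treated as blurry
inverted = sharp[::-1]                                   # a different (inverted) pattern
full, empty = np.ones((16, 16)), np.zeros((16, 16))
frames = [sharp, sharp.copy(), flat, inverted, inverted]
masks = [full, full, full, full, empty]
keep = select_frames(frames, masks)
```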


Table 2, Bounding box Average Precision per category 


Table 3, Performance measurements of the trained model 


0.774 0.877 
Two examples of images used to evaluate our trained 
model are shown in Figure 5. 





Figure 5 Sample result of Mask R-CNN model 
validation 


Conclusion 

Using ML algorithms for image segmentation and object 
isolation from the background can be more useful and 
accurate than traditional algorithms in many use cases, 
depending on the type and nature of the processed images. 
Advances in hardware, ML algorithms, and libraries make it 
possible to apply these novel techniques to older problems, 
producing more accurate outputs. Models trained on large 
datasets can identify objects of interest in different 
environments and under various imaging conditions, whereas 
traditional image processing usually assumes certain conditions 
that must be met for the algorithm to give optimal results. The 
results show that the selected algorithm is effective and 
accurate for this task. The performance measures 
demonstrate that the algorithm could detect most of the true 
PV pixels while successfully avoiding background pixels. The 
model was trained on only a small dataset containing images 
taken with the same equipment under the same environmental 
conditions. Better results are expected when using larger 
datasets with more diverse 
images to train and validate the model. This work was used 
in part to help assess large-scale PV installations and detect 
faults and malfunctions. It can also be useful in information 
gathering applications where PV panels are detected in 
satellite and aerial images. 